This workflow displays a variety of interactive Plotly visualizations to explore the mouse melanoma data published in the recent Pozniak, et al 2022. Plotly visualizations found in this workflow are also found in the accompanying R Shiny App.
library("dplyr")
library("tibble")
library("ggplot2")
library("ggpubr")
library("GSVA")
library("readr")
library("heatmaply")
library("plotly")
The aggregated NRAS13_all_43k cell dataset and test gene
set consisting of two gene files are loaded:
## read aggregated count human malignant dataset
df1 <- readRDS(file = "./counts/NRAS13_all_43k_agg_counts.rds")
# df2 <- readRDS(file = "./counts/NRAS13_malign_preprint_agg_counts.rds")
# head(df1)
This workflow is primarily designed with the intention to correlate two gene sets. Here we load such a data file with two toy gene sets. It is possible to upload a custom user-generated gene set as long as the file is a CSV or TSV file and the gene sets are the first and second columns. This workflow ignores additional columns, and these columns may be named whatever the end user wishes.
## read gene sets of interest to assay in the dataset
g1 <- readr::read_csv("./counts/mouse_test_genes.csv")
## display gene lists as data table
g1 %>%
DT::datatable()
This block of code puts each row of the test gene set into a list
with each element in the list given the same name as the original
dataframe columns, however NA values are removed from the
list elements.
gl <- list()
for (i in 1:ncol(g1)) {
g1 %>%
dplyr::select(i) %>%
dplyr::pull() -> gl[[i]]
gl[[i]] <- gl[[i]][!is.na(gl[[i]])]
}
names(gl) <- names(g1)
The app applies GSVA on the gene sets found in the test gene file. This is the most time consuming computational step in the app.
## run ssGSEA
GSVA::gsva(expr = as.matrix(df1 %>% dplyr::select(-gene) %>% as.data.frame()),
gset.idx.list = gl) -> gsva_sig
## Estimating GSVA scores for 2 gene sets.
## Estimating ECDFs with Gaussian kernels
##
|
| | 0%
|
|=================================== | 50%
|
|======================================================================| 100%
The dataframe resulting from the GSVA step is converted to a long dataframe and the information contained in the aggregated cell names is used to re-label the aggregated cell types by sample.
## convert data for violin Plot
gsva_sig %>%
as.data.frame() %>%
tibble::rownames_to_column() %>%
dplyr::rename(gene_sig = names(.)[1]) %>%
tidyr::gather(key = "cell_name", value = "expression", -gene_sig) %>%
dplyr::mutate(sample = gsub("^[^_]*.", "",
gsub("_[^_]+$", "", cell_name))) %>%
dplyr::mutate(cell_type = dplyr::case_when(
grepl("B-cell", cell_name) ~ "B-cell",
grepl("CAF", cell_name) ~ "CAF",
grepl("DC", cell_name) ~ "DC",
grepl("EC", cell_name) ~ "EC",
grepl("Malignant", cell_name) ~ "Malignant",
grepl("Monocyte", cell_name) ~ "Monocyte/macrophage",
grepl("Pericyte", cell_name) ~ "Pericyte",
grepl("NKcell", cell_name) ~ "T/NKcell")) %>%
as.data.frame() -> a1
This plot displays gene expression by cell type for each gene signature. Due to UI space constraints in the app, we will probably be limited to displaying only the first two gene sets.
a1 %>%
ggplot(aes(x = cell_type, y = expression, fill = cell_type)) +
geom_violin(alpha = 0.8) +
geom_point(position = position_jitter(seed = 1, width = 0.2),
size = 0.4, alpha = 0.8) +
theme_classic() +
theme(legend.position = "none",
axis.text.x = element_text(angle = 45, hjust = 1)) +
ggtitle("Violin plot of gene signature scores by cell type") +
facet_grid(~gene_sig) -> p1
ggplotly(p1)
# p1
This block of code selects all of the genes found in the text file, extracts the counts for each sample and averages them for these genes. The gene counts have been transformed to log scale in order for highly expressed genes not skew the gradient scale and drown out subtle changes in expression.
df1 %>%
dplyr::filter(gene %in% unlist(g1)[!is.na(unlist(g1))]) %>%
tidyr::gather(key = "cell_name", value = "expression", -gene) %>%
dplyr::mutate(sample = gsub("^[^_]*.", "",
gsub("_[^_]+$", "", cell_name))) %>%
dplyr::mutate(cell_type = dplyr::case_when(
grepl("B-cell", cell_name) ~ "B-cell",
grepl("CAF", cell_name) ~ "CAF",
grepl("DC", cell_name) ~ "DC",
grepl("EC", cell_name) ~ "EC",
grepl("Malignant", cell_name) ~ "Malignant",
grepl("Monocyte", cell_name) ~ "Monocyte/macrophage",
grepl("Pericyte", cell_name) ~ "Pericyte",
grepl("NKcell", cell_name) ~ "T/NKcell")) %>%
dplyr::group_by(gene, cell_type) %>%
dplyr::summarise(mean_exp = mean(expression)) %>%
tidyr::spread(key = "cell_type", value = "mean_exp") %>%
tibble::column_to_rownames(var = "gene") %>%
dplyr::mutate_all(log) %>%
as.data.frame() -> r1
is.na(r1) <- sapply(r1, is.infinite)
r1[is.na(r1)] <- 0
heatmaply::heatmaply(
x = r1,
main = "Heatmap of log gene counts",
xlab = "Cell types",
ylab = "Genes of interest",
scale_fill_gradient_fun = ggplot2::scale_fill_gradient2(
low = "blue",
high = "firebrick"))
df1 %>%
dplyr::filter(gene %in% unlist(g1)[!is.na(unlist(g1))]) %>%
tidyr::gather(key = "cell_name", value = "expression", -gene) %>%
dplyr::mutate(sample = gsub("^[^_]*.", "",
gsub("_[^_]+$", "", cell_name))) %>%
dplyr::mutate(cell_type = dplyr::case_when(
grepl("B-cell", cell_name) ~ "B-cell",
grepl("CAF", cell_name) ~ "CAF",
grepl("DC", cell_name) ~ "DC",
grepl("EC", cell_name) ~ "EC",
grepl("Malignant", cell_name) ~ "Malignant",
grepl("Monocyte", cell_name) ~ "Monocyte/macrophage",
grepl("Pericyte", cell_name) ~ "Pericyte",
grepl("NKcell", cell_name) ~ "T/NKcell")) %>%
dplyr::group_by(gene, cell_type) %>%
dplyr::summarise(mean_exp = mean(expression)) %>%
tidyr::spread(key = "cell_type", value = "mean_exp") %>%
tibble::column_to_rownames(var = "gene") %>%
dplyr::mutate_all(log) %>%
as.data.frame() -> r2
is.na(r2) <- sapply(r2, is.infinite)
r2[is.na(r2)] <- 0
r2 %>%
t() %>%
cor() %>%
heatmaply::heatmaply(
main = "Correlation Heatmap of log gene counts across samples",
xlab = "Cell types by time point",
ylab = "Genes of interest",
scale_fill_gradient_fun = ggplot2::scale_fill_gradient2(
low = "blue",
high = "firebrick"))
r2 %>%
cor() %>%
heatmaply::heatmaply(
main = "Correlation Heatmap of log gene counts across samples",
xlab = "Cell types by time point",
ylab = "Genes of interest",
scale_fill_gradient_fun = ggplot2::scale_fill_gradient2(
low = "blue",
high = "firebrick"))
The following correlation plots display how each gene set correlate each gene set with each other across cell types and time points. The correlation for each group is reported by displaying the R value and p-value.
gsva_sig %>%
t() %>% as.data.frame() %>%
tibble::rownames_to_column() %>%
dplyr::rename(sample_name = names(.)[1]) %>%
dplyr::mutate(sample = gsub("^[^_]*.", "",
gsub("_[^_]+$", "", sample_name))) %>%
dplyr::mutate(cell_type = dplyr::case_when(
grepl("B-cell", sample_name) ~ "B-cell",
grepl("CAF", sample_name) ~ "CAF",
grepl("DC", sample_name) ~ "DC",
grepl("EC", sample_name) ~ "EC",
grepl("Malignant", sample_name) ~ "Malignant",
grepl("Monocyte", sample_name) ~ "Monocyte/macrophage",
grepl("Pericyte", sample_name) ~ "Pericyte",
grepl("NKcell", sample_name) ~ "T/NKcell")) %>%
as.data.frame() -> d1
d1 %>%
dplyr::group_by(sample) %>%
dplyr::summarise(mean_gene1 = mean(gene1),
mean_gene2 = mean(gene2)) %>%
ggplot(aes(x = mean_gene1, y = mean_gene2)) +
geom_point(size = 1, color = "black") +
stat_smooth(method = "lm", se = TRUE, fill = "gray", color = "darkgray",
formula = y ~ poly(x, 1, raw = TRUE)) +
ggpubr::stat_cor(method = "spearman", label.x = 0) +
theme_classic() +
xlab("Signature Score 1") +
ylab("Signature Score 2") +
theme(axis.title = element_text(size = 14),
legend.position = "none") +
ggtitle(paste0("Correlation between signature scores across cell types")
) -> p0
ggplotly(p0)
d1 %>%
dplyr::filter(!grepl("Patient", cell_type)) %>%
ggplot(aes(x = gene1, y = gene2)) +
geom_point(size = 1, aes(color = cell_type)) +
stat_smooth(method = "lm", se = TRUE, fill = "gray", color = "darkgray",
formula = y ~ poly(x, 1, raw = TRUE)) +
ggpubr::stat_cor(method = "spearman", label.x = 0) +
theme_classic() +
xlab("Signature Score 1") +
ylab("Signature Score 2") +
theme(axis.title = element_text(size = 14),
legend.position = "none") +
ggtitle(paste0("Correlation between signature scores across cell types")) +
facet_grid(cols = vars(cell_type)) -> p4
# p4
ggplotly(p4)
sessioninfo::session_info()
## ─ Session info ───────────────────────────────────────────────────────────────
## setting value
## version R version 4.2.2 (2022-10-31)
## os macOS Big Sur ... 10.16
## system x86_64, darwin17.0
## ui X11
## language (EN)
## collate en_US.UTF-8
## ctype en_US.UTF-8
## tz Europe/Brussels
## date 2022-12-15
## pandoc 2.19.2 @ /Applications/RStudio.app/Contents/MacOS/quarto/bin/tools/ (via rmarkdown)
##
## ─ Packages ───────────────────────────────────────────────────────────────────
## package * version date (UTC) lib source
## abind 1.4-5 2016-07-21 [1] CRAN (R 4.2.0)
## annotate 1.74.0 2022-04-26 [1] Bioconductor
## AnnotationDbi 1.58.0 2022-04-26 [1] Bioconductor
## assertthat 0.2.1 2019-03-21 [1] CRAN (R 4.2.0)
## backports 1.4.1 2021-12-13 [1] CRAN (R 4.2.0)
## beachmat 2.12.0 2022-04-26 [1] Bioconductor
## Biobase 2.56.0 2022-04-26 [1] Bioconductor
## BiocGenerics 0.42.0 2022-04-26 [1] Bioconductor
## BiocParallel 1.30.4 2022-10-11 [1] Bioconductor
## BiocSingular 1.12.0 2022-04-26 [1] Bioconductor
## Biostrings 2.64.1 2022-08-18 [1] Bioconductor
## bit 4.0.5 2022-11-15 [1] CRAN (R 4.2.2)
## bit64 4.0.5 2020-08-30 [1] CRAN (R 4.2.0)
## bitops 1.0-7 2021-04-24 [1] CRAN (R 4.2.0)
## blob 1.2.3 2022-04-10 [1] CRAN (R 4.2.0)
## broom 1.0.1 2022-08-29 [1] CRAN (R 4.2.0)
## bslib 0.4.1 2022-11-02 [1] CRAN (R 4.2.2)
## ca 0.71.1 2020-01-24 [1] CRAN (R 4.2.0)
## cachem 1.0.6 2021-08-19 [1] CRAN (R 4.2.0)
## car 3.1-1 2022-10-19 [1] CRAN (R 4.2.1)
## carData 3.0-5 2022-01-06 [1] CRAN (R 4.2.0)
## cli 3.4.1 2022-09-23 [1] CRAN (R 4.2.0)
## codetools 0.2-18 2020-11-04 [1] CRAN (R 4.2.2)
## colorspace 2.0-3 2022-02-21 [1] CRAN (R 4.2.0)
## crayon 1.5.2 2022-09-29 [1] CRAN (R 4.2.1)
## crosstalk 1.2.0 2021-11-04 [1] CRAN (R 4.2.0)
## data.table 1.14.6 2022-11-16 [1] CRAN (R 4.2.2)
## DBI 1.1.3 2022-06-18 [1] CRAN (R 4.2.0)
## DelayedArray 0.22.0 2022-04-26 [1] Bioconductor
## DelayedMatrixStats 1.18.2 2022-10-13 [1] Bioconductor
## dendextend 1.16.0 2022-07-04 [1] CRAN (R 4.2.0)
## digest 0.6.31 2022-12-11 [1] CRAN (R 4.2.2)
## dplyr * 1.0.10 2022-09-01 [1] CRAN (R 4.2.0)
## DT 0.26 2022-10-19 [1] CRAN (R 4.2.1)
## ellipsis 0.3.2 2021-04-29 [1] CRAN (R 4.2.0)
## evaluate 0.19 2022-12-13 [1] CRAN (R 4.2.2)
## fansi 1.0.3 2022-03-24 [1] CRAN (R 4.2.0)
## farver 2.1.1 2022-07-06 [1] CRAN (R 4.2.0)
## fastmap 1.1.0 2021-01-25 [1] CRAN (R 4.2.0)
## foreach 1.5.2 2022-02-02 [1] CRAN (R 4.2.0)
## generics 0.1.3 2022-07-05 [1] CRAN (R 4.2.0)
## GenomeInfoDb 1.32.4 2022-09-06 [1] Bioconductor
## GenomeInfoDbData 1.2.8 2022-04-25 [1] Bioconductor
## GenomicRanges 1.48.0 2022-04-26 [1] Bioconductor
## ggplot2 * 3.4.0 2022-11-04 [1] CRAN (R 4.2.2)
## ggpubr * 0.5.0 2022-11-16 [1] CRAN (R 4.2.2)
## ggsignif 0.6.4 2022-10-13 [1] CRAN (R 4.2.1)
## glue 1.6.2 2022-02-24 [1] CRAN (R 4.2.0)
## graph 1.74.0 2022-04-26 [1] Bioconductor
## gridExtra 2.3 2017-09-09 [1] CRAN (R 4.2.0)
## GSEABase 1.58.0 2022-04-26 [1] Bioconductor
## GSVA * 1.44.5 2022-09-22 [1] Bioconductor
## gtable 0.3.1 2022-09-01 [1] CRAN (R 4.2.0)
## HDF5Array 1.24.2 2022-08-02 [1] Bioconductor
## heatmaply * 1.4.0 2022-10-08 [1] CRAN (R 4.2.1)
## hms 1.1.2 2022-08-19 [1] CRAN (R 4.2.0)
## htmltools 0.5.4 2022-12-07 [1] CRAN (R 4.2.2)
## htmlwidgets 1.5.4 2021-09-08 [1] CRAN (R 4.2.0)
## httr 1.4.4 2022-08-17 [1] CRAN (R 4.2.0)
## IRanges 2.30.1 2022-08-18 [1] Bioconductor
## irlba 2.3.5.1 2022-10-03 [1] CRAN (R 4.2.0)
## iterators 1.0.14 2022-02-05 [1] CRAN (R 4.2.0)
## jquerylib 0.1.4 2021-04-26 [1] CRAN (R 4.2.0)
## jsonlite 1.8.4 2022-12-06 [1] CRAN (R 4.2.2)
## KEGGREST 1.36.3 2022-07-12 [1] Bioconductor
## knitr 1.41 2022-11-18 [1] CRAN (R 4.2.0)
## labeling 0.4.2 2020-10-20 [1] CRAN (R 4.2.0)
## lattice 0.20-45 2021-09-22 [1] CRAN (R 4.2.2)
## lazyeval 0.2.2 2019-03-15 [1] CRAN (R 4.2.0)
## lifecycle 1.0.3 2022-10-07 [1] CRAN (R 4.2.1)
## magrittr 2.0.3 2022-03-30 [1] CRAN (R 4.2.0)
## Matrix 1.5-3 2022-11-11 [1] CRAN (R 4.2.0)
## MatrixGenerics 1.8.1 2022-06-26 [1] Bioconductor
## matrixStats 0.63.0 2022-11-18 [1] CRAN (R 4.2.0)
## memoise 2.0.1 2021-11-26 [1] CRAN (R 4.2.0)
## mgcv 1.8-41 2022-10-21 [1] CRAN (R 4.2.2)
## munsell 0.5.0 2018-06-12 [1] CRAN (R 4.2.0)
## nlme 3.1-160 2022-10-10 [1] CRAN (R 4.2.2)
## pillar 1.8.1 2022-08-19 [1] CRAN (R 4.2.0)
## pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.2.0)
## plotly * 4.10.1 2022-11-07 [1] CRAN (R 4.2.2)
## plyr 1.8.8 2022-11-11 [1] CRAN (R 4.2.0)
## png 0.1-8 2022-11-29 [1] CRAN (R 4.2.2)
## purrr 0.3.5 2022-10-06 [1] CRAN (R 4.2.1)
## R6 2.5.1 2021-08-19 [1] CRAN (R 4.2.0)
## RColorBrewer 1.1-3 2022-04-03 [1] CRAN (R 4.2.0)
## Rcpp 1.0.9 2022-07-08 [1] CRAN (R 4.2.0)
## RCurl 1.98-1.9 2022-10-03 [1] CRAN (R 4.2.1)
## readr * 2.1.3 2022-10-01 [1] CRAN (R 4.2.1)
## registry 0.5-1 2019-03-05 [1] CRAN (R 4.2.0)
## reshape2 1.4.4 2020-04-09 [1] CRAN (R 4.2.0)
## rhdf5 2.40.0 2022-04-26 [1] Bioconductor
## rhdf5filters 1.8.0 2022-04-26 [1] Bioconductor
## Rhdf5lib 1.18.2 2022-05-15 [1] Bioconductor
## rlang 1.0.6 2022-09-24 [1] CRAN (R 4.2.0)
## rmarkdown 2.18 2022-11-09 [1] CRAN (R 4.2.0)
## RSQLite 2.2.19 2022-11-24 [1] CRAN (R 4.2.2)
## rstatix 0.7.1 2022-11-09 [1] CRAN (R 4.2.0)
## rstudioapi 0.14 2022-08-22 [1] CRAN (R 4.2.0)
## rsvd 1.0.5 2021-04-16 [1] CRAN (R 4.2.0)
## S4Vectors 0.34.0 2022-04-26 [1] Bioconductor
## sass 0.4.4 2022-11-24 [1] CRAN (R 4.2.0)
## ScaledMatrix 1.4.1 2022-09-11 [1] Bioconductor
## scales 1.2.1 2022-08-20 [1] CRAN (R 4.2.0)
## seriation 1.4.0 2022-10-21 [1] CRAN (R 4.2.0)
## sessioninfo 1.2.2 2021-12-06 [1] CRAN (R 4.2.0)
## SingleCellExperiment 1.18.1 2022-10-02 [1] Bioconductor
## sparseMatrixStats 1.8.0 2022-04-26 [1] Bioconductor
## stringi 1.7.8 2022-07-11 [1] CRAN (R 4.2.0)
## stringr 1.5.0 2022-12-02 [1] CRAN (R 4.2.2)
## SummarizedExperiment 1.26.1 2022-05-01 [1] Bioconductor
## tibble * 3.1.8 2022-07-22 [1] CRAN (R 4.2.0)
## tidyr 1.2.1 2022-09-08 [1] CRAN (R 4.2.0)
## tidyselect 1.2.0 2022-10-10 [1] CRAN (R 4.2.1)
## TSP 1.2-1 2022-07-14 [1] CRAN (R 4.2.0)
## tzdb 0.3.0 2022-03-28 [1] CRAN (R 4.2.0)
## utf8 1.2.2 2021-07-24 [1] CRAN (R 4.2.0)
## vctrs 0.5.1 2022-11-16 [1] CRAN (R 4.2.2)
## viridis * 0.6.2 2021-10-13 [1] CRAN (R 4.2.0)
## viridisLite * 0.4.1 2022-08-22 [1] CRAN (R 4.2.0)
## vroom 1.6.0 2022-09-30 [1] CRAN (R 4.2.0)
## webshot 0.5.4 2022-09-26 [1] CRAN (R 4.2.1)
## withr 2.5.0 2022-03-03 [1] CRAN (R 4.2.0)
## xfun 0.35 2022-11-16 [1] CRAN (R 4.2.2)
## XML 3.99-0.13 2022-12-04 [1] CRAN (R 4.2.2)
## xtable 1.8-4 2019-04-21 [1] CRAN (R 4.2.0)
## XVector 0.36.0 2022-04-26 [1] Bioconductor
## yaml 2.3.6 2022-10-18 [1] CRAN (R 4.2.1)
## zlibbioc 1.42.0 2022-04-26 [1] Bioconductor
##
## [1] /Library/Frameworks/R.framework/Versions/4.2/Resources/library
##
## ──────────────────────────────────────────────────────────────────────────────